Analyzing establishment nonresponse using an interpretable regression tree model with linked administrative data
To gain insight into how characteristics of an establishment are associated
with nonresponse, a recursive partitioning algorithm is applied to the
Occupational Employment Statistics May 2006 survey data to build a regression
tree. The tree models an establishment's propensity to respond to the survey
given certain establishment characteristics. It partitions establishments into
mutually exclusive cells, defined by those characteristics, within which
response propensities are homogeneous. This makes it easy to identify
interpretable associations between the characteristic variables and an
establishment's propensity to respond, something not easily done using a
logistic regression propensity model. We test the model obtained
using the May data against data from the November 2006 Occupational Employment
Statistics survey. Testing the model on a disjoint set of establishment data
with a very large sample size offers evidence that the regression
tree model accurately describes the association between the establishment
characteristics and the response propensity for the OES survey. The accuracy of
this modeling approach is compared to that of logistic regression through
simulation. This representation is then used along with frame-level
administrative wage data linked to sample data to investigate the possibility
of nonresponse bias. We show that, without proper adjustments, the nonresponse
does pose a risk of bias and is possibly nonignorable.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS521 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
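The idea of a regression tree over 0/1 response indicators can be illustrated with a small recursive-partitioning sketch. It assumes a squared-error split criterion (a common choice for regression trees) and numerically coded characteristics. The feature names `size_class` and `multi_unit`, the stopping parameters `max_depth` and `min_leaf`, and all function names are hypothetical, and this is not the specific algorithm or software used in the study. Because the response indicator is 0/1, each leaf's mean is the observed response rate in that cell, which serves as its estimated propensity.

```python
# Hypothetical sketch: recursive partitioning of 0/1 response indicators
# with a squared-error split criterion (not the study's exact algorithm).


def sse(ys):
    """Sum of squared deviations of ys from their mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, ys, features, min_leaf):
    """Return (cost, feature, threshold) for the lowest-SSE split, or None."""
    best = None
    for f in features:
        # Candidate cut points: every observed value except the largest.
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[f] <= thr]
            right = [y for r, y in zip(rows, ys) if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, thr)
    return best


def grow(rows, ys, features, depth=0, max_depth=3, min_leaf=2):
    """Grow the tree; a leaf stores its cell's observed response rate."""
    leaf = {"propensity": sum(ys) / len(ys), "n": len(ys)}
    if depth >= max_depth:
        return leaf
    split = best_split(rows, ys, features, min_leaf)
    if split is None or split[0] >= sse(ys):  # no split lowers the SSE
        return leaf
    _, f, thr = split
    left = [(r, y) for r, y in zip(rows, ys) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] > thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": grow([r for r, _ in left], [y for _, y in left],
                     features, depth + 1, max_depth, min_leaf),
        "right": grow([r for r, _ in right], [y for _, y in right],
                      features, depth + 1, max_depth, min_leaf),
    }


def leaves(node, path=()):
    """List the mutually exclusive cells as (conditions, propensity, n)."""
    if "propensity" in node:
        return [(path, node["propensity"], node["n"])]
    f, thr = node["feature"], node["threshold"]
    return (leaves(node["left"], path + (f"{f} <= {thr}",))
            + leaves(node["right"], path + (f"{f} > {thr}",)))


# Toy data (illustrative only): 1 = responded, 0 = did not respond.
rows = [{"size_class": 1, "multi_unit": 0}, {"size_class": 2, "multi_unit": 1},
        {"size_class": 3, "multi_unit": 0}, {"size_class": 4, "multi_unit": 1}]
tree = grow(rows, [1, 1, 0, 0], ["size_class", "multi_unit"])
for cell in leaves(tree):
    print(cell)
# (('size_class <= 2',), 1.0, 2)
# (('size_class > 2',), 0.0, 2)
```

Each path printed by `leaves` is a readable rule, which is the interpretability advantage the abstract contrasts with a logistic regression propensity model.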
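Because the administrative wage is linked to every sampled establishment, respondents and nonrespondents can be compared directly on it. The sketch below is a minimal illustration under stated assumptions: an equal-probability sample (no sampling weights), hypothetical function names, and toy numbers that are not from the study. `respondent_gap` measures how far the respondent mean of the linked variable sits from the full-sample mean; `cell_adjusted_mean` applies a weighting-class adjustment that weights each respondent by the inverse of its cell's response rate, where the cells could be, for example, the leaves of a response-propensity tree.

```python
from collections import defaultdict


def mean(xs):
    return sum(xs) / len(xs)


def respondent_gap(admin_wage, responded):
    """Respondent mean minus full-sample mean of a linked frame variable."""
    resp = [x for x, r in zip(admin_wage, responded) if r]
    return mean(resp) - mean(admin_wage)


def cell_adjusted_mean(admin_wage, responded, cell):
    """Respondent mean after weighting each respondent by 1 / (cell response rate)."""
    n_cell = defaultdict(int)  # sampled units per cell
    r_cell = defaultdict(int)  # respondents per cell
    for r, c in zip(responded, cell):
        n_cell[c] += 1
        r_cell[c] += int(r)
    num = den = 0.0
    for x, r, c in zip(admin_wage, responded, cell):
        if r:
            w = n_cell[c] / r_cell[c]
            num += w * x
            den += w
    return num / den


# Toy values (hypothetical): linked wage, response flag, and cell label.
wage = [10.0, 20.0, 30.0, 40.0]
responded = [True, True, False, True]
cells = ["a", "a", "b", "b"]
print(respondent_gap(wage, responded))            # about -1.67
print(cell_adjusted_mean(wage, responded, cells))  # 27.5
```

With these toy values the full-sample mean is 25, the raw respondent mean is about 23.33, and the cell-adjusted mean is 27.5. The adjustment does not recover the full-sample mean because, within cell `b`, responding is related to the wage itself, which is the kind of pattern that motivates concern about nonignorable nonresponse.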
Bayesian Estimation Under Informative Sampling
Bayesian analysis is increasingly popular for use in social science and other
application areas where the data are observations from an informative sample.
An informative sampling design leads to inclusion probabilities that are
correlated with the response variable of interest. Under informative sampling,
model inference performed on the observed sample will be biased for the
population generating model, since the balance of information in the sample
data differs from that in the population. Typical approaches
to account for an informative sampling design under Bayesian estimation are
often difficult to implement because they require re-parameterization of the
hypothesized generating model, or focus on design, rather than model-based,
inference. We propose to construct a pseudo-posterior distribution that
utilizes sampling weights based on the marginal inclusion probabilities to
exponentiate the likelihood contribution of each sampled unit, which weights
the information in the sample back to the population. Our approach provides a
nearly automated estimation procedure applicable to any model specified by the
data analyst for the population and retains the population model
parameterization and posterior sampling geometry. We construct conditions on
known marginal and pairwise inclusion probabilities that define a class of
sampling designs where consistency of the pseudo-posterior is
guaranteed. We demonstrate our method on an application concerning the Bureau
of Labor Statistics Job Openings and Labor Turnover Survey.
Comment: 24 pages, 3 figures
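The pseudo-posterior construction, pi_w(theta | y) proportional to [product over sampled units i of p(y_i | theta)^(w_i)] * pi(theta), can be made concrete in a conjugate case. The sketch below assumes a Bernoulli model for a binary response variable with a Beta(a, b) prior, weights w_i = 1/pi_i from the marginal inclusion probabilities, and a rescaling of the weights to sum to the sample size n; the model, the normalization, and the function names are illustrative assumptions rather than the paper's implementation. Raising each Bernoulli likelihood term to the power w_i keeps the Beta form, so the update simply adds weighted successes and failures.

```python
# Hypothetical sketch of a weighted pseudo-posterior for a Bernoulli
# proportion p with a conjugate Beta(a, b) prior.


def normalized_weights(inclusion_probs):
    """Weights 1/pi_i, rescaled to sum to the sample size n (assumed normalization)."""
    raw = [1.0 / pi for pi in inclusion_probs]
    n, total = len(raw), sum(raw)
    return [w * n / total for w in raw]


def pseudo_posterior_beta(y, inclusion_probs, a=1.0, b=1.0):
    """Return (alpha, beta) of the Beta pseudo-posterior.

    Raising unit i's likelihood p**y_i * (1 - p)**(1 - y_i) to the power w_i
    and multiplying by the Beta(a, b) prior gives another Beta density whose
    parameters add the weighted successes and the weighted failures.
    """
    w = normalized_weights(inclusion_probs)
    successes = sum(wi * yi for wi, yi in zip(w, y))
    failures = sum(wi * (1 - yi) for wi, yi in zip(w, y))
    return a + successes, b + failures


# The unit with y = 1 was sampled with probability 0.5, the unit with y = 0
# with probability 0.25, so the second unit represents more of the population.
alpha, beta = pseudo_posterior_beta([1, 0], [0.5, 0.25])
print(alpha / (alpha + beta))  # pseudo-posterior mean, about 0.417
```

With equal inclusion probabilities the normalized weights are all 1 and the result reduces to the ordinary Beta posterior; here the unweighted posterior mean would be 0.5, while the weighted version shifts toward the under-sampled outcome.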
Adding interior points to an existing Brownian sheet lattice
We compute the conditional distribution of new interior points given a lattice representing a path of a Brownian sheet process in discrete time. This is done so that we can simulate paths of this multi-parameter Gaussian process by refining previously simulated paths, which allows one to refine a particular area of the path that is of interest.
Keywords: Brownian sheet, Simulation, Conditional distribution
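Since the Brownian sheet is Gaussian, the conditional distribution of a new point given the lattice values follows from the standard Gaussian conditioning formulas: mean = c^T S^(-1) x and variance = sigma^2 - c^T S^(-1) c, where S is the covariance matrix of the lattice points, c the covariances between the new point and the lattice, and x the lattice values. The sketch below is a brute-force illustration assuming the standard Brownian sheet covariance Cov(W(s, t), W(s', t')) = min(s, s') * min(t, t') and lattice points with strictly positive coordinates (the sheet is identically zero on the axes). The function names are hypothetical, and the paper's own derivation may exploit the lattice structure far more efficiently than a dense linear solve.

```python
import math
import random


def sheet_cov(p, q):
    """Covariance of a standard Brownian sheet at points p = (s, t) and q = (s', t')."""
    return min(p[0], q[0]) * min(p[1], q[1])


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def conditional_moments(new_pt, lattice_pts, lattice_vals):
    """Conditional mean and variance of W(new_pt) given W at the lattice points."""
    sigma_11 = sheet_cov(new_pt, new_pt)
    c = [sheet_cov(new_pt, p) for p in lattice_pts]
    sigma_22 = [[sheet_cov(p, q) for q in lattice_pts] for p in lattice_pts]
    a = solve(sigma_22, c)  # a = S^(-1) c, using the symmetry of S
    mean = sum(ai * xi for ai, xi in zip(a, lattice_vals))
    var = sigma_11 - sum(ai * ci for ai, ci in zip(a, c))
    return mean, max(var, 0.0)  # clip tiny negative round-off


def add_interior_point(new_pt, lattice_pts, lattice_vals, rng=random):
    """Draw W(new_pt) from its conditional distribution given the lattice."""
    mean, var = conditional_moments(new_pt, lattice_pts, lattice_vals)
    return rng.gauss(mean, math.sqrt(var))


# Along t = 1 the sheet is a Brownian motion in s, so given W(1, 1) = 2.0
# the point (0.5, 1) behaves like a Brownian bridge: mean 1.0, variance 0.25.
print(conditional_moments((0.5, 1.0), [(1.0, 1.0)], [2.0]))  # (1.0, 0.25)
```

Several new points can be added one at a time, appending each simulated value to the lattice before drawing the next; by the chain rule this samples from the correct joint conditional distribution. For the point (1.5, 1.5) inside the lattice {1, 2} x {1, 2}, the conditioning weights come out as 0.25 on each corner and the conditional variance as 0.6875.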